175 research outputs found

    C-Pack: Packaged Resources To Advance General Chinese Embedding

    Full text link
    We introduce C-Pack, a package of resources that significantly advances the field of general Chinese embeddings. C-Pack includes three critical resources: 1) C-MTEB, a comprehensive benchmark for Chinese text embeddings covering 6 tasks and 35 datasets; 2) C-MTP, a massive text embedding dataset curated from labeled and unlabeled Chinese corpora for training embedding models; and 3) C-TEM, a family of embedding models covering multiple sizes. Our models outperform all prior Chinese text embeddings on C-MTEB by up to +10% as of the time of release. We also integrate and optimize the entire suite of training methods for C-TEM. Along with our resources on general Chinese embedding, we release our data and models for English text embeddings. The English models achieve state-of-the-art performance on the MTEB benchmark; meanwhile, our released English data is 2 times larger than the Chinese data. All these resources are made publicly available at https://github.com/FlagOpen/FlagEmbedding.
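    As a minimal usage sketch for the released embedding models, the snippet below encodes two Chinese sentences and compares them by cosine similarity. The model id ("BAAI/bge-small-zh-v1.5") and the sentence-transformers loading path are illustrative assumptions rather than details taken from the abstract.

```python
# Minimal usage sketch for a C-TEM-style Chinese embedding model.
# Assumption: the model id "BAAI/bge-small-zh-v1.5" and loading via
# sentence-transformers are illustrative choices, not stated in the abstract.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-zh-v1.5")  # assumed model id

sentences = ["今天天气很好", "今天是晴天"]
embeddings = model.encode(sentences, normalize_embeddings=True)

# With normalized embeddings, cosine similarity is a plain dot product.
similarity = float(embeddings[0] @ embeddings[1])
print(similarity)
```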

    Probabilistic hesitant fuzzy multiple attribute decision-making based on regret theory for the evaluation of venture capital projects

    Get PDF
    The selection of venture capital investment projects is one of the most important decision-making activities for venture capitalists. Due to the complexity of the investment market and people's limited cognition, most venture capital investment decisions are made under high uncertainty, and venture capitalists are often only boundedly rational. To address such problems, this article presents a regret-theory-based approach to probabilistic hesitant fuzzy multiple attribute decision-making. Firstly, when the occurrence probabilities of the elements in a probabilistic hesitant fuzzy element (PHFE) are unknown or only partially known, two mathematical programming models, based respectively on water-filling theory and the maximum entropy principle, are provided to handle these situations. Secondly, to capture the psychological behaviours of venture capitalists, regret theory is applied to the selection of venture capital investment projects. Finally, a comparative analysis with existing approaches is conducted to demonstrate the feasibility and applicability of the proposed method.
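    As a rough illustration of the regret-theoretic ranking step, the sketch below scores alternatives described by PHFEs using the standard textbook choices u(x) = x^alpha for utility and R(d) = 1 - exp(-delta * d) for regret/rejoice. The functions, parameter values, and data are illustrative assumptions; the paper's concrete models are not given in the abstract.

```python
# Generic regret-theory scoring of alternatives described by probabilistic
# hesitant fuzzy elements (PHFEs). Assumptions: the power utility and the
# exponential regret/rejoice function are standard textbook choices, not
# the paper's exact formulation.
import math

def expected_value(phfe):
    """PHFE as a list of (membership, probability) pairs."""
    return sum(g * p for g, p in phfe)

def utility(x, alpha=0.88):           # power utility, risk-averse for alpha < 1
    return x ** alpha

def regret_rejoice(diff, delta=0.3):  # negative diff -> regret, positive -> rejoice
    return 1.0 - math.exp(-delta * diff)

def perceived_utilities(alternatives):
    """Perceived utility = own utility + regret/rejoice vs. the best alternative."""
    utils = [utility(expected_value(a)) for a in alternatives]
    best = max(utils)
    return [u + regret_rejoice(u - best) for u in utils]

# Two candidate projects, each described by one PHFE (probabilities sum to 1).
projects = [
    [(0.6, 0.5), (0.8, 0.5)],
    [(0.7, 0.3), (0.5, 0.7)],
]
print(perceived_utilities(projects))  # higher is better
```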

    Wasserstein distance-based probabilistic linguistic TODIM method with application to the evaluation of sustainable rural tourism potential

    Get PDF
    The evaluation of sustainable rural tourism potential is a key task in sustainable rural tourism development. Due to the complexity of the rural tourism development situation and people's limited cognition, most assessment problems for sustainable rural tourism potential are highly uncertain, which makes the evaluation information difficult to characterise and measure. Moreover, decision-makers (DMs) usually do not exhibit complete rationality in practical evaluation. To tackle such problems, this paper proposes a new behavioural multi-attribute group decision-making (MAGDM) method with probabilistic linguistic term sets (PLTSs) by integrating the Wasserstein distance measure into the TODIM (a Portuguese acronym for interactive multi-criteria decision-making) method. Firstly, a new Wasserstein-based distance measure for PLTSs is defined, and several properties of the proposed distance are established. Secondly, based on the correlation coefficients among attributes and the standard deviation of each attribute, an attribute weight determination method (called the PL-CRITIC method) is proposed. Subsequently, a Wasserstein distance-based probabilistic linguistic TODIM method is developed. Finally, the proposed method is applied to the evaluation of sustainable rural tourism potential, along with sensitivity and comparative analyses, to illustrate the effectiveness and advantages of the new method.
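    To make the core distance concrete, the sketch below computes a 1-D Wasserstein (earth mover's) distance between two PLTSs, each modelled as a discrete distribution over the subscripts of a linguistic term set S = {s_0, ..., s_6}. The paper's exact definition and normalisation are not given in the abstract; this is the generic formulation via scipy, with the term-set size as an assumption.

```python
# Generic 1-D Wasserstein distance between two probabilistic linguistic
# term sets (PLTSs). Assumption: a PLTS is modelled as a discrete
# distribution over term subscripts of S = {s_0, ..., s_6}; the paper's
# exact definition is not given in the abstract.
from scipy.stats import wasserstein_distance

# PLTS as (term subscripts, probabilities); probabilities sum to 1.
plts_a = ([3, 4, 5], [0.2, 0.5, 0.3])  # assessment leaning "good"
plts_b = ([2, 3],    [0.6, 0.4])       # assessment leaning "slightly poor"

d = wasserstein_distance(plts_a[0], plts_b[0],
                         u_weights=plts_a[1], v_weights=plts_b[1])
print(d / 6)  # divide by the largest subscript to scale into [0, 1]
```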

    Retrieve Anything To Augment Large Language Models

    Full text link
    Large language models (LLMs) face significant challenges stemming from their inherent limitations in knowledge, memory, alignment, and action. These challenges cannot be addressed by LLMs alone; they call for assistance from the external world, such as knowledge bases, memory stores, demonstration examples, and tools. Retrieval augmentation is a vital mechanism for bridging the gap between LLMs and this external assistance. However, conventional methods encounter two pressing issues. On the one hand, general-purpose retrievers are not properly optimized for the retrieval augmentation of LLMs. On the other hand, task-specific retrievers lack the versatility required across diverse retrieval augmentation scenarios. In this work, we present a novel approach, the LLM-Embedder, which comprehensively supports the diverse retrieval augmentation needs of LLMs with one unified embedding model. Training such a unified model is non-trivial, as the various retrieval tasks aim to capture distinct semantic relationships that often interfere with one another. To address this challenge, we systematically optimize our training methodology, including reward formulation based on LLMs' feedback, stabilized knowledge distillation, multi-task fine-tuning with explicit instructions, and homogeneous in-batch negative sampling. These optimization strategies contribute to the outstanding empirical performance of the LLM-Embedder. Notably, it yields remarkable enhancements in retrieval augmentation for LLMs, surpassing both general-purpose and task-specific retrievers in various evaluation scenarios. Our checkpoint and source code are publicly available at https://github.com/FlagOpen/FlagEmbedding.
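    As one concrete illustration of the training recipe, the sketch below implements contrastive learning with in-batch negatives, the mechanism behind "homogeneous in-batch negative sampling": each query's positive key sits on the diagonal of the similarity matrix, and every other key in the (task-homogeneous) batch serves as a negative. The encoder, temperature value, and batch contents are illustrative assumptions; the authors' actual implementation is in the linked repository.

```python
# Minimal sketch of contrastive training with in-batch negatives.
# Assumptions: the temperature and dummy embeddings are illustrative;
# the abstract does not specify the authors' exact loss or hyperparameters.
import torch
import torch.nn.functional as F

def in_batch_negative_loss(q_emb, k_emb, temperature=0.02):
    """q_emb, k_emb: (batch, dim) embeddings of matched query/key pairs.

    Row i of the similarity matrix treats k_emb[i] as the positive and
    every other key in the batch as a negative.
    """
    q = F.normalize(q_emb, dim=-1)
    k = F.normalize(k_emb, dim=-1)
    logits = q @ k.T / temperature    # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0))  # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Dummy batch: in a homogeneous batch, all pairs come from the same task,
# so in-batch negatives do not mix incompatible semantics.
q = torch.randn(8, 768)
k = torch.randn(8, 768)
print(in_batch_negative_loss(q, k))
```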
    • …